Background: Analyzing complex medical data requires specialized knowledge and expertise, making it both time-consuming and resource-intensive. Large language models (LLMs), such as GPT-4, excel at tasks like coding and medical statistics. However, analyzing datasets is more complex than interacting with a chatbot: it involves several critical steps, including planning, tracking information, locating data, and developing and refining the appropriate statistical analyses.
Artificial intelligence (AI) agents are an emerging approach in which each agent performs specific tasks based on predefined instructions. We developed a framework in which multiple AI agents collaborate to accomplish a task, with each agent assigned a distinct role in the data analysis process. The agents coordinate by sending and receiving messages, so that the task is completed systematically and collaboratively.
Methods: We included the 20 most recent studies from the Center for International Blood and Marrow Transplant Research (CIBMTR) with publicly available data. The primary objective was to evaluate the accuracy of AI agents in replicating the primary outcomes of these studies. Using the AutoGen platform, we constructed a six-agent framework comprising a user proxy, planner, data retriever, data cleaner, coder, and results reviewer, with GPT-4o as the underlying LLM. The framework was given simple instructions to replicate each primary outcome, such as "compare overall survival based on different reduced-intensity conditioning regimens," and its results were compared with those of the original studies. Each experiment was repeated three times to assess the reliability of the results. The included studies, instructions, and results can be viewed at https://github.com/jwang-580/CIBMTR_data.
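To make the division of labor concrete, the following is a minimal, framework-free Python sketch of role-based agents relaying messages, ending in a coder/reviewer revision loop. It does not use or imitate the AutoGen API: the `Message` and `Agent` classes, `run_pipeline`, the `max_revisions` limit, and the `"APPROVED"` convention are hypothetical, the user proxy is reduced to the sender of the initial task, and the lambdas stand in for what would be GPT-4o calls with role-specific prompts.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    sender: str
    content: str


@dataclass
class Agent:
    name: str
    # Hypothetical stand-in for the agent's behavior; in a real framework this
    # would be an LLM call guided by a role-specific system prompt.
    act: Callable[[str], str]

    def reply(self, incoming: Message) -> Message:
        return Message(self.name, self.act(incoming.content))


def run_pipeline(task: str, stages: list[Agent], coder: Agent,
                 reviewer: Agent, max_revisions: int = 3) -> list[Message]:
    """Relay the task through the preparatory stages, then let the coder and
    reviewer exchange messages until the reviewer approves or rounds run out."""
    transcript = [Message("user_proxy", task)]
    for agent in stages:
        transcript.append(agent.reply(transcript[-1]))
    for _ in range(max_revisions):
        transcript.append(coder.reply(transcript[-1]))
        verdict = reviewer.reply(transcript[-1])
        transcript.append(verdict)
        if verdict.content.startswith("APPROVED"):
            break
    return transcript


# Toy agents with canned behavior (illustrative only).
stages = [
    Agent("planner", lambda m: f"plan for: {m}"),
    Agent("data_retriever", lambda m: "downloaded dataset and data dictionary"),
    Agent("data_cleaner", lambda m: "cleaned dataset"),
]
attempts = iter(["code v1", "code v2"])
coder = Agent("coder", lambda m: next(attempts))
reviewer = Agent("results_reviewer",
                 lambda m: "APPROVED" if m == "code v2" else "fix time units")

log = run_pipeline("compare overall survival by RIC regimen", stages, coder, reviewer)
for msg in log:
    print(f"{msg.sender}: {msg.content}")
```

Bounding the coder/reviewer exchange is one simple way to let debugging happen automatically without risking an endless conversation; the abstract does not specify how the actual framework routes messages among its six agents.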
Results: The 20 included studies were published between 2021 and 2023, with topics spanning chronic leukemias (20%), health disparities (15%), immunobiology (15%), acute leukemias (10%), lymphomas (10%), graft-versus-host disease (10%), infection (10%), and survivorship (10%). The primary study objectives related either to survival outcomes (75%) or to specific complications (25%), such as the incidence of bacterial and viral infections (5%), pulmonary toxicities (5%), primary graft failure (5%), and secondary malignancies (5%). The statistical methods used to generate the primary outcomes were multivariable regression analysis (45%), descriptive statistics (40%), and univariate regression analysis (15%). The AI agents adhered to their designated roles: they automatically downloaded datasets and data dictionaries, drafted data analysis plans, selected relevant variables, cleaned datasets, generated and debugged analysis code, and interpreted the results.
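Because most primary outcomes concerned survival, the following hypothetical Python sketch shows the kind of analysis a coder agent might produce: a Kaplan-Meier estimate of overall survival, preceded by a months-to-years conversion of follow-up time. The abstract does not say which libraries or estimators the agents used; the functions `kaplan_meier` and `survival_at` and the six-patient toy data are illustrative assumptions, not CIBMTR data.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, S(time)) at each event time.

    times:  follow-up durations, all in the same unit
    events: 1 if death was observed, 0 if the patient was censored
    """
    exits = Counter(times)  # patients leaving the risk set at each time
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths[t]
        if d:
            # Censored patients at time t still count as at risk at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def survival_at(curve, t):
    """Read the step function: S(t) is the last estimate at or before t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Hypothetical toy data: follow-up recorded in months.
months = [6, 12, 12, 18, 24, 30]
events = [1, 1, 0, 1, 0, 1]
years = [m / 12 for m in months]  # convert months to years before analysis

curve = kaplan_meier(years, events)
print(survival_at(curve, 1.0))  # 1-year overall survival, about 0.667
# Without the conversion, "1" is read as 1 month, before any deaths:
print(survival_at(kaplan_meier(months, events), 1.0))  # 1.0
```

The second call shows how a skipped unit conversion silently changes the answer, which is the kind of data-transformation error the Results identify as the most common cause of failure.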
The multi-AI agent framework accurately replicated 53% of the primary outcomes (95% confidence interval [CI] 41-66%). This rate was significantly higher than that achieved with ChatGPT alone, without the multi-AI agent framework, which replicated 35% of the results (95% CI 24-47%; p=0.04, t-test). By outcome type, the framework correctly replicated 58% of survival-related primary outcomes and 40% of complication-related ones. By statistical method, it replicated 44%, 71%, and 67% of results from studies using multivariable regression, descriptive statistics, and univariate regression, respectively. The most common cause of inaccurate results was data transformation, such as converting time units or selecting data subsets. Notably, no hallucination of data or results was observed, owing to framework optimizations. The average cost per analysis, including input and output processing, was $1.20, and each analysis was completed in under 1 minute.
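The reported interval for the multi-agent framework can be checked with a normal-approximation (Wald) confidence interval for a proportion. This is a hedged Python sketch: the abstract does not state how the intervals were computed, and the denominator of 60 runs (20 studies x 3 repetitions) and the count of 32 correct replications are assumptions back-calculated from the reported 53%.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% CI for a binomial proportion, clipped to [0, 1]."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Assumed counts: 20 studies x 3 repeats = 60 runs; 53% of 60 is about 32 correct.
low, high = wald_ci(32, 60)
print(f"{32 / 60:.0%} (95% CI {low:.0%}-{high:.0%})")  # 53% (95% CI 41%-66%)
```

Under these assumptions the Wald interval reproduces the reported 41-66%; other methods, such as the Wilson interval, give slightly different bounds for samples of this size.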
Conclusion: We developed a multi-AI agent framework capable of collaboratively extracting, cleaning, organizing, and analyzing CIBMTR datasets. The agents successfully replicated published results efficiently and cost-effectively. Our approach significantly improves accuracy over the state-of-the-art, non-agent-based AI methods. It has the potential to transform complex data analysis, facilitating major advancements in medical research.
Nazha: Incyte: Current Employment, Current equity holder in publicly-traded company.